Abstract: We present a geometric method to determine confidence sets for the ratio
E(Y)/E(X) of the means of random variables X and Y. This method reduces
the problem of constructing confidence sets for the ratio of two random variables
to the problem of constructing confidence sets for the means of one-dimensional
random variables. It is valid in a large variety of circumstances. In the case of
normally distributed random variables, the confidence sets constructed in this way
coincide with the standard Fieller confidence sets. Generalizations of our construction
lead to definitions of exact and conservative confidence sets for very general classes of
distributions, provided the joint expectation of (X, Y) exists and the linear
combinations of the form aX+bY are well-behaved. Finally, our geometric method allows
us to derive a very simple bootstrap approach for constructing conservative confidence
sets for ratios which perform favorably in certain situations, in particular in the
asymmetric heavy-tailed regime.

1. Introduction 

In many practical applications we encounter the problem of estimating the ratio 
of two random variables X and Y. This could, for example, be the case if we want 
to know how large one quantity is relative to the other, or if we want to estimate 
at which position a regression line intersects the abscissa (e.g., Miller (1986); 
Buonaccorsi (2001); see also Franz (submitted) for many references to practical
studies involving ratios). While it is straightforward to construct an estimator
for E(Y)/E(X) by dividing the two sample means of X and Y, it is not obvious
how confidence regions for this estimator can be defined. In the case where X 
and Y are jointly normally distributed, an exact solution to this problem has 
been derived by Fieller (1932, 1940, 1944, 1954); for more detailed discussions 
see Kendall and Stuart (1961), Finney (1978), Miller (1986), and Buonaccorsi 
(2001). But in applications, practitioners often do not use Fieller's results and
apply ad-hoc methods instead. Perhaps the main reason is that Fieller's confidence
regions do not look like "normal" confidence intervals and are often perceived as
counter-intuitive. In benign cases they form an interval which is not symmetric 
around the estimator, while in worse cases the confidence region consists of two 
disjoint unbounded intervals, or even of the whole real line. Especially the latter
case is highly unusual as the confidence region does not exclude any value at all,
which is certainly not what one would expect from a well-behaved confidence region.
However, different researchers (Gleser and Hwang, 1987; Koschat, 1987; Hwang,
1995) have shown that any method which is not able to generate such unbounded
confidence limits for a ratio leads to arbitrarily large deviations from the intended
confidence level. For a discussion of the conditional confidence level, given that 
the Fieller confidence limits are bounded, see Buonaccorsi and Iyer (1984). 

There have been several approaches to present Fieller's methods in a more intu- 
itive way. Especially remarkable are the ones which rely on geometric arguments. 
Milliken (1982) attempted a geometric proof for Fieller's result in the case where 
X and Y are independent normally distributed random variables. Unfortunately, 
his proof contained an error which led him to the wrong conclusion that Fieller's 
confidence regions were too conservative. Later, his proof was corrected and
simplified by Guiard (1989). He considers the case that X and Y are jointly normally
distributed according to $(X, Y) \sim N(\mu, \sigma^2 V)$, where the mean $\mu$ and the
scale $\sigma^2$ of the covariance are unknown, but the covariance matrix V is known. Guiard
presents a geometric construction of confidence regions, and then shows by an 
elegant comparison to a likelihood ratio test that the constructed regions are ex- 
act and coincide with Fieller's solution. The drawback of his proof is that it only 
works in the case where the covariance matrix V is known, which in practice is 
usually not the case. Moreover, although the confidence sets are constructed by 
a geometric procedure, Guiard's proof relies on properties of the likelihood ratio 
test and does not give geometric insights into why the construction is correct. 

In this article we derive several simple geometric constructions for exact
confidence sets for ratios. Our construction coincides with Guiard's if (X, Y) are
normally distributed with known covariance matrix V, but it is also valid in 
the case where V is unknown. Our proof techniques are remarkably simple and 
purely geometric. The understanding gained by our approach then allows us to
extend the geometric construction from normally distributed random variables
to more general classes of distributions. While it is relatively straightforward to 
define confidence sets for elliptically symmetric distributions, another extension 
leads to a completely new construction of confidence sets for ratios which is exact 
for a very large class of distributions. Essentially, the only assumptions we have
to make are that the means of X and Y exist and that it is possible to construct
exact confidence sets for the mean of linear combinations of the form $a_1 X + a_2 Y$.
To our knowledge, this is the first definition of exact confidence sets for ratios
of very general classes of distributions. Finally, using the geometric insights also
leads to a simple bootstrap procedure for confidence sets for ratios. This method
is particularly well-suited for highly asymmetric and heavy-tailed distributions.

1.1 Definitions and notation

We will always consider the following situation. We are given a sample of n
pairs $Z_i := (X_i, Y_i)_{i=1,\dots,n}$ drawn independently according to some underlying
distribution. In the first part we will always assume that this joint distribution
is a 2-dimensional normal distribution $N(\mu, C)$ with mean $\mu = (\mu_1, \mu_2)$ and
covariance matrix $C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}$.
We assume that both $\mu$ and $C$ are unknown.
Later we will also study more general classes of distributions. Our goal will be
to estimate the ratio $\rho := \mu_2/\mu_1$ and construct confidence sets for this ratio. To
estimate the unknown mean and the covariance matrix we will use the standard
estimators: the means are estimated by

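\[ \hat\mu_1 := \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \hat\mu_2 := \frac{1}{n} \sum_{i=1}^{n} Y_i, \tag{1.1} \]

and the estimated covariance matrix
$\hat C = \begin{pmatrix} \hat c_{11} & \hat c_{12} \\ \hat c_{21} & \hat c_{22} \end{pmatrix}$ has the entries

\[ \hat c_{11} := \frac{1}{n}\,\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \hat\mu_1)^2 \quad\text{and}\quad \hat c_{22} := \frac{1}{n}\,\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \hat\mu_2)^2, \tag{1.2} \]

\[ \hat c_{12} := \hat c_{21} = \frac{1}{n}\,\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \hat\mu_1)(Y_i - \hat\mu_2). \]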

Note that we rescaled the estimators $\hat c_{ij}$ by $1/n$ to reflect the variability of the
estimators $\hat\mu_1$ and $\hat\mu_2$. This will be convenient later on. As estimator for the ratio
$\rho = \mu_2/\mu_1$ we use $\hat\rho := \hat\mu_2/\hat\mu_1$. Note that our goal is to estimate E(Y)/E(X) and
not E(Y/X). In fact, if X and Y are normally distributed, the latter quantity
does not even exist. As in this situation the estimators $\hat\mu_1$ and $\hat\mu_2$ are normally
distributed as well, we can also see that the estimator $\hat\rho$ cannot be unbiased, as
its expectation $E(\hat\rho) = E(\hat\mu_2/\hat\mu_1)$ simply does not exist. For more discussion on
the bias of the estimator $\hat\rho$ see Beale (1962); Tin (1965); Durbin (1959); Rao
(1981); Miller (1986) and Dalabehera and Sahoo (1995).
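
As a minimal sketch, the estimators of Equations (1.1) and (1.2) can be computed as follows. The function name `ratio_estimators` and the layout of its return value are illustrative choices and not part of the paper; the normalization 1/(n(n-1)) assumes the usual unbiased sample covariance combined with the rescaling by 1/n described above.

```python
import math


def ratio_estimators(xs, ys):
    """Return ((mu1, mu2), (c11, c12, c22), rho_hat) for paired samples.

    The covariance entries are the unbiased sample (co)variances rescaled
    by 1/n, so they describe the variability of the sample means.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length n >= 2")
    mu1 = sum(xs) / n
    mu2 = sum(ys) / n
    scale = 1.0 / (n * (n - 1))  # 1/(n-1) for unbiasedness, 1/n for the mean
    c11 = scale * sum((x - mu1) ** 2 for x in xs)
    c22 = scale * sum((y - mu2) ** 2 for y in ys)
    c12 = scale * sum((x - mu1) * (y - mu2) for x, y in zip(xs, ys))
    # The plug-in ratio is undefined when the denominator mean is exactly 0.
    rho_hat = mu2 / mu1 if mu1 != 0 else math.nan
    return (mu1, mu2), (c11, c12, c22), rho_hat
```

The covariance matrix is returned as the triple (c11, c12, c22) because it is symmetric; this keeps the sketch free of external dependencies.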

For $\alpha \in\, ]0, 1[$, a confidence set (or confidence region) of level $1 - \alpha$ for a parameter
$\theta \in \Theta$ is defined to be a set R constructed from the sample such that for all
$\theta \in \Theta$ it holds that $P_\theta(\theta \in R) \ge 1 - \alpha$. If this statement holds with equality,
then the confidence set R is called exact, otherwise it is called conservative. If the
statement $P_\theta(\theta \in R) = 1 - \alpha$ only holds in the limit for the sample size $n \to \infty$,
the confidence set R is called asymptotically exact. A confidence interval $[l, u]$ is
called equal-tailed if $P_\theta(\theta < l) = P_\theta(\theta > u)$. It is called symmetric around an
estimate $\hat\theta$ if it has the form $[\hat\theta - q, \hat\theta + q]$. For general background reading about
confidence sets we refer to Chapter 20 of Kendall and Stuart (1961), Section 5.2 of Schervish
(1995), and Chapter 4 of Shao and Tu (1995). For a real-valued random variable
with distribution function F and a number $\alpha \in\, ]0, 1[$, the $\alpha$-quantile of F is
defined as the smallest number x such that $F(x) \ge \alpha$. We will denote this quantile
by $q(F, \alpha)$. In the special case where F is induced by the Student-t distribution
with $f$ degrees of freedom, we will denote the quantile by $q(t_f, \alpha)$.

Many of the geometric arguments in this paper will be based on orthogonal
projections of the two-dimensional plane to a one-dimensional subspace. In the
two-dimensional plane, we define the line $L_\rho$ through the origin with slope $\rho$ and
the line $L_{\rho\perp}$ orthogonal to $L_\rho$ by

\[ L_\rho := \{(x, y) \in \mathbb{R}^2 \mid y = \rho x\} \]

\[ L_{\rho\perp} := \{(x, y) \in \mathbb{R}^2 \mid y = (-1/\rho)\, x\}. \]

For an arbitrary unit vector $a = (a_1, a_2)' \in \mathbb{R}^2$ let

\[ \pi_a : \mathbb{R}^2 \to \mathbb{R}, \quad x \mapsto a'x = a_1 x_1 + a_2 x_2 \]

be the orthogonal projection of the two-dimensional plane on the one-dimensional
subspace spanned by a, that is on the line $L_r$ with slope $r = a_2/a_1$. We will
also write $\pi_r$ for the projection on $L_r$, and $\pi_{r\perp}$ for the projection on the line $L_{r\perp}$.

Let $C \in \mathbb{R}^{2 \times 2}$ be a covariance matrix (i.e., positive definite and symmetric) with
eigenvectors $v_1, v_2 \in \mathbb{R}^2$ and eigenvalues $\lambda_1, \lambda_2 \in \mathbb{R}$. Consider the ellipse centered
at some point $\mu \in \mathbb{R}^2$ such that its principal axes have the directions of $v_1, v_2$
and have lengths $q\sqrt{\lambda_1}$ and $q\sqrt{\lambda_2}$ for some $q > 0$. We denote this ellipse by
$E(C, \mu, q)$ and call it the covariance ellipse corresponding to C centered at $\mu$ and
scaled with parameter q. This ellipse can also be described as the set of points
$z \in \mathbb{R}^2$ which satisfy the ellipse equation $(z - \mu)' C^{-1} (z - \mu) = q^2$.
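
The two descriptions of the covariance ellipse can be checked against each other numerically. The following is a small sketch under the assumption that the symmetric matrix C is passed as the triple (c11, c12, c22); the function names are hypothetical and chosen only for this illustration.

```python
import math


def ellipse_form(C, mu, z):
    """Evaluate (z - mu)' C^{-1} (z - mu) for a symmetric 2x2 matrix C.

    C is given as (c11, c12, c22) and is assumed to be positive definite.
    A point z lies on the ellipse E(C, mu, q) iff the result equals q**2.
    """
    c11, c12, c22 = C
    det = c11 * c22 - c12 ** 2
    d1, d2 = z[0] - mu[0], z[1] - mu[1]
    return (c22 * d1 ** 2 - 2.0 * c12 * d1 * d2 + c11 * d2 ** 2) / det


def ellipse_axis_lengths(C, q):
    """Return (q*sqrt(lambda1), q*sqrt(lambda2)) with lambda1 >= lambda2."""
    c11, c12, c22 = C
    mean = (c11 + c22) / 2.0
    dev = math.hypot((c11 - c22) / 2.0, c12)  # closed form for 2x2 eigenvalues
    return q * math.sqrt(mean + dev), q * math.sqrt(mean - dev)
```

For a diagonal matrix such as C = (4, 0, 1) with q = 2, the axis lengths are 4 and 2, and the point reached by moving 4 units from the center along the first axis satisfies the ellipse equation with right-hand side q^2 = 4.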

2. Exact confidence regions for normally distributed 
random variables 

Let us start with a few geometric observations. For given $\mu = (\mu_1, \mu_2) \in \mathbb{R}^2$, the
ratio $\rho = \mu_2/\mu_1$ can be depicted as the slope of the line $L_\rho$ in the two-dimensional
plane which passes both through the origin and the point $(\mu_1, \mu_2)$. Similarly, the
estimated ratio $\hat\rho$ is given as the slope of the line through the origin and the point
$\hat\mu = (\hat\mu_1, \hat\mu_2)$ (cf. Figure 2.1). Assume that we are given a confidence interval
$R = [l, u] \subset \mathbb{R}$ that contains the estimator $\hat\rho$. The lower and upper limits of this
interval correspond to the slopes of the two lines passing through the origin and
the points $(1, l)$ and $(1, u)$, respectively. Let W denote the wedge enclosed by
those two lines. The slopes of the lines inside the wedge exactly correspond to
the ratios inside the interval R. The other way round, the interval $[l, u]$ can be
reconstructed from the wedge as the intersection of the wedge with the line x = 1
(cf. Figure 2.1).

Figure 2.1: Geometric principles. The ratio $\mu_2/\mu_1$ can be depicted as the slope of the
line through the points (0,0) and $(\mu_1, \mu_2)$. The ratios inside an interval correspond
to the slopes of all lines in the wedge spanned by the lines with slopes l and u. For a
given wedge, the corresponding interval $[l, u]$ can be obtained by intersecting the wedge
with the line x = 1.

2.1 Geometric construction of exact confidence sets

In the following we want to construct an appropriate wedge containing $\hat\mu$ such that
the region obtained by intersection with the line x = 1 yields an exact confidence
region for $\rho$ of level $1 - \alpha$. This wedge will be constructed as the smallest wedge
containing a certain ellipse around the estimated mean $(\hat\mu_1, \hat\mu_2)$. We will see
that depending on the position of the ellipse, we have to distinguish between
three different cases called "bounded", "exclusive unbounded", and "completely
unbounded". For an illustration see Figure 2.2.

Construction 1 (Geometric construction of exact confidence regions
$R_{geo}$ for $\rho$ in case of normal distributions)

1. Estimate the means $\hat\mu_1$ and $\hat\mu_2$ according to Equation (1.1), and the covariance
matrix $\hat C$ according to Equation (1.2).

2. Define the real number q as $q(t_{n-1}, 1 - \alpha/2)$. That is, q is the $(1 - \alpha/2)$-quantile
of the Student-t distribution with n - 1 degrees of freedom.

3. In the two-dimensional plane, plot the ellipse $E = E(\hat C, \hat\mu, q)$ centered at
the estimated joint mean $\hat\mu = (\hat\mu_1, \hat\mu_2)$, with shape according to the estimated
covariance matrix $\hat C$, and scaled by the number q computed in the step before.

4. Depending on the position of the ellipse, distinguish between the following
cases (see Figure 2.2).

(a) If (0,0) is not inside E, construct the two tangents to E through the
origin (0,0) and let W be the wedge enclosed by those tangents. Define
the region $R_{geo}$ as the intersection of W with the line x = 1. Depending
on whether the y-axis lies inside W or not, this results in an exclusive
unbounded or a bounded confidence region.

(b) If (0,0) is inside E, choose the confidence region as $R_{geo} = ]-\infty, \infty[$
(completely unbounded case).
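
The steps of Construction 1 can be sketched in code as follows. This is an illustration under stated assumptions, not the authors' implementation: the quantile q is passed in by the caller (it could be obtained from a Student-t table or a statistics library), the matrix is given as the triple (c11, c12, c22), and the function name and return format are hypothetical. The tangent slopes are found from the condition that a line through the origin with slope r touches the ellipse exactly when the projection of the center onto the normal of that line equals the half-width of the projected ellipse.

```python
import math


def geometric_confidence_set(mu_hat, C_hat, q):
    """Sketch of Construction 1.

    Returns ("completely unbounded", None), ("bounded", (l, u)) meaning
    [l, u], or ("exclusive unbounded", (l, u)) meaning ]-inf, l] U [u, inf[.
    """
    m1, m2 = mu_hat
    c11, c12, c22 = C_hat
    det = c11 * c22 - c12 ** 2
    # Step 4(b): the origin is inside E iff mu' C^{-1} mu <= q^2.
    form = (c22 * m1 ** 2 - 2.0 * c12 * m1 * m2 + c11 * m2 ** 2) / det
    if form <= q ** 2:
        return "completely unbounded", None
    # Step 4(a): tangent slopes r solve
    # (m2 - r*m1)^2 = q^2 * (c22 - 2*r*c12 + r^2*c11),
    # i.e. a*r^2 - 2*b*r + c = 0 with the coefficients below.
    a = m1 ** 2 - q ** 2 * c11
    b = m1 * m2 - q ** 2 * c12
    c = m2 ** 2 - q ** 2 * c22
    if a == 0:
        # The y-axis itself is a tangent; this boundary case is not handled.
        raise ValueError("degenerate case: ellipse touches the y-axis")
    root = math.sqrt(b * b - a * c)  # positive because the origin is outside E
    r1, r2 = (b - root) / a, (b + root) / a
    lo, hi = min(r1, r2), max(r1, r2)
    # a > 0: the ellipse does not meet the y-axis, so the wedge is bounded.
    kind = "bounded" if a > 0 else "exclusive unbounded"
    return kind, (lo, hi)
```

For example, with center (10, 5), identity covariance, and q = 2, the ellipse is a circle of radius 2 and the two tangent slopes are 28/96 and 0.75, which gives a bounded interval around the estimated ratio 0.5.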

Let us give some intuitive reasons why the three cases and the form of the
confidence sets make sense. In the first case, the denominator $\hat\mu_1$ is significantly
different from 0. Here we do not expect any difficulties from dividing by $\hat\mu_1$ as
the denominator is "safely away from 0". Our uncertainty about the value of $\rho$
is restricted to some interval around $\hat\rho$, which corresponds to the bounded case.
To relate this to the geometric construction, observe that the denominator is
significantly different from 0 if and only if the ellipse E does not touch the y-axis.
The situation is more complicated if the denominator is not significantly different
from 0, that is, if the ellipse intersects the y-axis. As we divide by a number
potentially close to 0, we cannot control the absolute value of the outcome, which
might become arbitrarily large, nor can we be sure about its sign. Hence, regions
of the form $]-\infty, c_1]$ and $[c_2, \infty[$ should be part of the confidence region. If,
additionally, we are confident that the numerator is not too small, then we expect
that $\rho$ is not very close to 0. This is reflected by the "exclusive unbounded" case.
If, on the other hand, the numerator is not significantly different from 0, then we
cannot guarantee anything: when dividing 0/0 any outcome is conceivable.
Here the confidence set should coincide with the whole real line, which is the
"completely unbounded" case.

Theorem 1 ($R_{geo}$ is an exact confidence set for $\rho$) Let $(X_i, Y_i)_{i=1,\dots,n}$ be
an i.i.d. sample drawn from the distribution $N(\mu, C)$ with unknown $\mu$ and $C$,
and let $R_{geo}$ be the region constructed according to Construction 1. Then $R_{geo}$
is an exact confidence region of level $1 - \alpha$ for $\rho$, that is for all $\mu$ and $C$ we have
$P(\rho \in R_{geo}) = 1 - \alpha$.

Figure 2.2: The three cases in the construction of the confidence set $R_{geo}$: the bounded
case, where the ellipse does not intersect the y-axis, the exclusive unbounded case, where
the ellipse intersects the y-axis but does not contain the origin, and the completely
unbounded case, where the ellipse contains the origin.

Proof. Let $a = (a_1, a_2)' \in \mathbb{R}^2$ be an arbitrary unit vector. We denote by
$U := \pi_a(X, Y)$ the projection of the joint random variable $(X, Y)$ on the
subspace spanned by a. Then U is distributed according to $N(a'\mu, a'Ca)$. The
independent sample points $(X_i, Y_i)_{i=1,\dots,n}$ are mapped by $\pi_a$ to independent
sample points $(U_i)_{i=1,\dots,n}$. It is easy to see that the length of the interval $I := \pi_a(E)$
is $2q(a'\hat C a)^{1/2}$. Taking into account the choice of the scaling factor q in
Construction 1 as the $(1 - \alpha/2)$-quantile of the Student-t distribution, by the normality
assumption on $(X, Y)$ we can now conclude that the projected ellipse $\pi_a(E)$ is a
$(1 - \alpha)$-confidence interval for the mean $\pi_a(\mu)$ of the projected random variables:

\[ 1 - \alpha = P\big( \pi_a(\mu) \in [\pi_a(\hat\mu) - q(a'\hat C a)^{1/2},\; \pi_a(\hat\mu) + q(a'\hat C a)^{1/2}] \big)
            = P\big( \pi_a(\mu) \in \pi_a(E) \big). \]

This equation is true for all unit vectors a. Now we want to consider
the particular projection $\pi_{\rho\perp}$ on the line $L_{\rho\perp}$ (that is, we choose
$a = (\rho/\sqrt{1+\rho^2},\, -1/\sqrt{1+\rho^2})$). Showing that
$\pi_{\rho\perp}(\mu) \in \pi_{\rho\perp}(E) \iff \rho \in R_{geo}$
will complete our proof. As in the construction of $R_{geo}$ we distinguish two cases.
If the origin is not inside the ellipse E we can construct the wedge W as
described in the construction of $R_{geo}$. In this case we have the following geometric
equivalences (see Figure 2.3):

\[ \pi_{\rho\perp}(\mu) \in \pi_{\rho\perp}(E) \iff 0 \in \pi_{\rho\perp}(E) \iff E \cap L_\rho \neq \emptyset \iff L_\rho \subset W \iff \rho \in R_{geo}. \]

Figure 2.3: Projection of the ellipse E on the subspace spanned by $\rho\perp$ (see proof of
Theorem 1).

In the second case, the origin is inside the ellipse E. In this case it is clear
that $\pi_{\rho\perp}(\mu) = 0$ is always inside $\pi_{\rho\perp}(E)$. On the other hand, by definition the
region $R_{geo}$ coincides with $]-\infty, \infty[$ in this case, and thus $\rho \in R_{geo}$ is true. □
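
The claim in the proof that the projected ellipse has length 2q(a'Ca)^{1/2} can be checked numerically. The sketch below is only an illustration: it parametrizes the boundary of the ellipse through a Cholesky factor of the matrix (an implementation choice made here, not something the paper prescribes), projects the boundary points onto a unit vector a, and compares the observed range with the closed-form length. All function names are hypothetical.

```python
import math


def projected_length_formula(C, a, q):
    """Closed-form length 2*q*sqrt(a' C a) of the projected ellipse."""
    c11, c12, c22 = C
    return 2.0 * q * math.sqrt(a[0] ** 2 * c11 + 2.0 * a[0] * a[1] * c12 + a[1] ** 2 * c22)


def projected_length_numeric(C, mu, a, q, steps=20000):
    """Project sampled boundary points of E(C, mu, q) onto the unit vector a."""
    c11, c12, c22 = C
    # Cholesky factor L of C (L L' = C), written out for the 2x2 case.
    l11 = math.sqrt(c11)
    l21 = c12 / l11
    l22 = math.sqrt(c22 - l21 ** 2)
    values = []
    for k in range(steps):
        t = 2.0 * math.pi * k / steps
        u, v = math.cos(t), math.sin(t)
        # Boundary point z = mu + q * L * (u, v)' satisfies the ellipse equation.
        z1 = mu[0] + q * l11 * u
        z2 = mu[1] + q * (l21 * u + l22 * v)
        values.append(a[0] * z1 + a[1] * z2)
    return max(values) - min(values)
```

Because the boundary is only sampled on a finite grid, the numeric value slightly underestimates the exact length; with the default number of steps the gap is far below typical plotting or rounding precision.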

2.2 Comparison to Fieller's confidence sets 

Theorem 1 shows that the confidence regions $R_{geo}$ obtained by Construction 1 are
exact confidence regions. Now we want to compare them to the classic confidence
sets constructed by Fieller (1932, 1940, 1944, 1954). To this end let us first state
Fieller's result according to Subsection 4, p. 176-177 of Fieller (1954). We
reformulate his definition in our notation:

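Definition 2 (Fieller's confidence regions for $\rho$ in case of normal
distributions) Compute the quantities

\[ q^2_{exclusive} := \frac{\hat\mu_1^2}{\hat c_{11}}, \qquad
   q^2_{complete} := \frac{\hat\mu_2^2 \hat c_{11} - 2 \hat\mu_1 \hat\mu_2 \hat c_{12} + \hat\mu_1^2 \hat c_{22}}{\hat c_{11} \hat c_{22} - \hat c_{12}^2}, \]

\[ l_{1,2} := \frac{1}{\hat\mu_1^2 - q^2 \hat c_{11}} \Big( (\hat\mu_1 \hat\mu_2 - q^2 \hat c_{12})
   \pm \sqrt{ (\hat\mu_1 \hat\mu_2 - q^2 \hat c_{12})^2 - (\hat\mu_1^2 - q^2 \hat c_{11})(\hat\mu_2^2 - q^2 \hat c_{22}) } \Big), \]

with q as in the definition of the confidence regions $R_{geo}$. Then define the
confidence set $R_{Fieller}$ of level $1 - \alpha$ for the ratio $\rho$ as follows:

\[ R_{Fieller} = \begin{cases}
   ]-\infty, \infty[ & \text{if } q^2_{complete} \le q^2 \\
   ]-\infty, \min\{l_1, l_2\}] \cup [\max\{l_1, l_2\}, \infty[ & \text{if } q^2_{exclusive} < q^2 < q^2_{complete} \\
   [\min\{l_1, l_2\}, \max\{l_1, l_2\}] & \text{otherwise}
   \end{cases} \]

Those three cases result in completely unbounded, exclusive unbounded, and bounded
confidence sets, respectively.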

Theorem 3 (Fieller) Let $(X_i, Y_i)_{i=1,\dots,n}$ be an i.i.d. sample drawn from the
distribution $N(\mu, C)$ with unknown $\mu$ and $C$. Then $R_{Fieller}$ as given in Definition
2 is an exact confidence region of level $1 - \alpha$ for $\rho$.

Proof of Fieller's theorem (sketch). Consider the function

\[ T_{r,C}(x) := \frac{x_2 - r x_1}{\sqrt{c_{22} - 2 r c_{12} + r^2 c_{11}}} \tag{2.1} \]

where $r \in \mathbb{R}$ is a parameter and C denotes the sample covariance matrix. If
applied to $r = \rho$ and $x = \hat\mu$, the statistic $T_{\rho,\hat C}(\hat\mu)$ has a Student-t distribution
with (n - 1) degrees of freedom. The set $R_{Fieller} := \{ r \in \mathbb{R} \mid T_{r,\hat C}(\hat\mu) \in [-q, q] \}$
now satisfies (by the definition of q as Student-t quantile)

\[ P(\rho \in R_{Fieller}) = P\big( T_{\rho,\hat C}(\hat\mu) \in [-q, q] \big) = 1 - \alpha. \]

Solving $-q \le T_{r,\hat C}(\hat\mu) \le q$ for r leads to a quadratic inequality whose solutions
are given by Fieller's theorem. □
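
The quadratic inequality mentioned in the proof sketch can be written out explicitly. The derivation below is a sketch that assumes the statistic has the form $T_{r,\hat C}(\hat\mu) = (\hat\mu_2 - r\hat\mu_1)/\sqrt{\hat c_{22} - 2r\hat c_{12} + r^2\hat c_{11}}$ and uses the quantities $q^2_{exclusive} = \hat\mu_1^2/\hat c_{11}$ and $q^2_{complete} = \hat\mu' \hat C^{-1} \hat\mu$ from Definition 2.

```latex
\begin{align*}
-q \le T_{r,\hat C}(\hat\mu) \le q
  &\iff (\hat\mu_2 - r\hat\mu_1)^2 \le q^2\,(\hat c_{22} - 2r\hat c_{12} + r^2\hat c_{11}) \\
  &\iff (\hat\mu_1^2 - q^2\hat c_{11})\,r^2
        - 2\,(\hat\mu_1\hat\mu_2 - q^2\hat c_{12})\,r
        + (\hat\mu_2^2 - q^2\hat c_{22}) \le 0 .
\end{align*}
% Leading coefficient and (quarter) discriminant of the quadratic in r:
\begin{align*}
\hat\mu_1^2 - q^2\hat c_{11} &= \hat c_{11}\,(q^2_{exclusive} - q^2), \\
D &= (\hat\mu_1\hat\mu_2 - q^2\hat c_{12})^2
     - (\hat\mu_1^2 - q^2\hat c_{11})(\hat\mu_2^2 - q^2\hat c_{22}) \\
  &= q^2\,(\hat c_{11}\hat c_{22} - \hat c_{12}^2)\,(q^2_{complete} - q^2).
\end{align*}
```

Read this way, the three cases correspond to the signs of the leading coefficient and of D. If $q^2 > q^2_{complete}$, then D is negative and the leading coefficient is negative, so every r satisfies the inequality. If $q^2_{exclusive} < q^2 < q^2_{complete}$, the parabola opens downward and has two real roots, so the solution set lies outside the roots. If $q^2 < q^2_{exclusive}$, the parabola opens upward and the solution set is the interval between the roots.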



Let us make a few comments about this proof. The most important property of
the statistic $T_{\rho,\hat C}(\hat\mu)$ is the fact that its distribution does not depend on $\rho$. That
is, it is a pivotal quantity. Otherwise, solving the inequalities $-q \le T_{r,\hat C}(\hat\mu) \le q$
for r would not lead to an expression which is independent of $\rho$. Moreover,
note that the mapping $T_{\rho,\hat C}$ projects the points on the line $L_{\rho\perp}$, and additionally
scales them such that the projected sample mean has variance 1. In particular it is
interesting to note that because $T_{\rho,\hat C}(\mu) = 0$, the set
$J_\rho = [T_{\rho,\hat C}(\hat\mu) - q,\; T_{\rho,\hat C}(\hat\mu) + q]$
is a $(1 - \alpha)$ confidence interval for the projected mean $T_{\rho,\hat C}(\mu)$:

\[ P\big( T_{\rho,\hat C}(\mu) \in J_\rho \big)
   = P\big( 0 \in [T_{\rho,\hat C}(\hat\mu) - q,\; T_{\rho,\hat C}(\hat\mu) + q] \big)
   = P\big( T_{\rho,\hat C}(\hat\mu) \in [-q, q] \big) = 1 - \alpha. \]

This property will be used later on to generalize Fieller's confidence set to more
general distributions. Also note that solving the inequality $-q \le T_{r,\hat C}(\hat\mu) \le q$
coincides with the construction of the wedge in the geometric construction. The
wedge can be seen as exactly the lines with slope r such that the scaled projection of
$\hat\mu$ on $L_{r\perp}$ is still within $[-q, q]$.

Based on all those observations it is very natural to expect a close relation
between $R_{Fieller}$ and $R_{geo}$. Still, a priori it is not clear that those two confidence
sets coincide, as confidence sets are not necessarily unique. But the following
theorem proves that this is indeed the case:

Theorem 4 ($R_{geo}$ and $R_{Fieller}$ coincide) The confidence region $R_{geo}$ defined
in Construction 1 coincides with $R_{Fieller}$ as given in Definition 2.

Proof. (Sketch) First one has to show that the three cases in Fieller's theorem
coincide with the three cases in the geometric approach. Second, one then has
to verify that the numbers $l_1$ and $l_2$ in Fieller's theorem coincide with the slopes
of the tangents to the ellipse. Both steps can be solved by straightforward but
lengthy calculations. Details can be found in von Luxburg and Franz (2004). □

Note that in the proof of Fieller's theorem we did not directly use the fact that
we have paired samples $(X_i, Y_i)_{i=1,\dots,n}$. Indeed, Fieller's theorem and its proof
are also valid in the more general setting where we are given two independent
samples $X_1, \dots, X_n$ and $Y_1, \dots, Y_m$ with a different number of sample points, and
use unbiased estimators for the means $\mu_1, \mu_2$ and independent unbiased
estimators for the (co)variances $c_{ij}$. In this case one has to take care to choose the
degrees of freedom in the Student-t distribution appropriately, see Buonaccorsi
(2001) and Section 3.3.3 of Rencher (1998).

3. Exact confidence sets for general random variables 

In this section we show how to extend our geometric approach to non-normally
distributed random variables. While it is straightforward to extend our geometric
approach to elliptically symmetric distributions, re-interpreting the geometric
construction also leads to a new construction for more general circumstances.

3.1 Elliptically symmetric distributions 

In the normally distributed case, the main reason why Construction 1 leads to
exact confidence sets is that the projected and studentized mean is Student-t
distributed, no matter in which direction we project. More generally, such a
property holds for all elliptically symmetric random variables. Elliptically
symmetric random variables can be written in the form $\mu + AY$ where $\mu$ is a shift
parameter, A is a matrix with $AA' = C$, and Y any spherically symmetric random
variable generated by some distribution H on $\mathbb{R}_+$. For a brief overview of
spherical and elliptical distributions see Eaton (1981), for an extensive treatment
see Fang, Kotz, and Ng (1990). In particular, if X is an elliptically symmetric
random variable with shift $\mu$, covariance C, and generator H, then the statistic
$T_{r,\hat C}(\hat\mu)$ introduced in Equation (2.1) is a pivotal quantity which has the same
distribution for all $r \in \mathbb{R}$. Denote the distribution function of this statistic by G.
To extend Construction 1 to the case of elliptically symmetric distributions, all
we have to do is to define the quantile q in Construction 1 or Definition 2 by the
quantile $q(G, 1 - \alpha/2)$ of the distribution G. With similar arguments as in the
last sections one can see that the resulting confidence set is exact.

Figure 3.1: Second geometric interpretation: By definition, ratio r is element of Fieller's
confidence set $R_{geo}$ if the line $L_r$ (depicted by the little arrow) is inside the wedge
enclosing the covariance ellipse. This is the case if and only if the origin is inside the
projection $\pi_{r\perp}(E)$ of the ellipse on the line $L_{r\perp}$. The left panel shows a case where
$r \in R_{geo}$, the right panel a case where $r \notin R_{geo}$.

3.2 Confidence sets for a very general class of distributions 

Once we leave the class of elliptically symmetric distributions, the distributions of
the projected means are no longer independent of the direction of the projection,
and all the techniques presented above cannot be used any more. However, there
is a surprisingly simple way to circumvent this problem. To see this, let us
re-interpret Construction 1 as depicted in Figure 3.1. Previously, to determine
whether $r \in \mathbb{R}$ should be element of $R_{geo}$ we checked whether the line with slope
r is inside the wedge enclosing the ellipse E. But note that the same result can
be achieved if we project the sample on the line $L_{r\perp}$, construct a one-dimensional
confidence set $J_r$ for the mean on $L_{r\perp}$, and check whether $0 \in J_r$ or not. This
observation is the key to the following construction:

Construction 2 (Exact confidence sets $R_{gen}$ for $\rho$ in case of general
distributions)

1. For each $r \in \mathbb{R}$, project the sample points on $L_{r\perp}$, that is define the new
points $U_{r,i} = \pi_{r\perp}(X_i, Y_i)$, $i = 1, \dots, n$.

2. For each $r \in \mathbb{R}$, construct a confidence set $J_r$ for the mean of $U_{r,i}$, that is a
set such that $P(\pi_{r\perp}(\mu) \in J_r) = 1 - \alpha$.

3. Then define the confidence set $R_{gen}$ for $\rho$ as $R_{gen} = \{ r \in \mathbb{R} \mid 0 \in J_r \}$.
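
The three steps of Construction 2 can be sketched as follows. Several choices here are assumptions made only for illustration: the set of all $r \in \mathbb{R}$ is replaced by a finite grid of candidate slopes, the rule for building $J_r$ is passed in as a function `interval_fn`, and the example rule `t_interval` (mean plus or minus q standard errors) is exact only when the projected sample is normally distributed. For other distributions a different, distribution-specific rule would have to be supplied.

```python
import math
import statistics


def t_interval(us, q):
    """Example rule for J_r: mean +/- q * standard error (assumes normal data)."""
    m = statistics.mean(us)
    se = statistics.stdev(us) / math.sqrt(len(us))
    return m - q * se, m + q * se


def in_general_set(r, xs, ys, interval_fn):
    """Steps 1-3 for a single slope r: project, build J_r, test whether 0 is in J_r."""
    norm = math.sqrt(1.0 + r * r)
    # Projection on the line orthogonal to L_r, using a = (r, -1) / sqrt(1 + r^2).
    us = [(r * x - y) / norm for x, y in zip(xs, ys)]
    lo, hi = interval_fn(us)
    return lo <= 0.0 <= hi


def general_set_on_grid(xs, ys, interval_fn, grid):
    """Approximate R_gen by the grid points r whose interval J_r contains 0."""
    return [r for r in grid if in_general_set(r, xs, ys, interval_fn)]
```

A call might look like `general_set_on_grid(xs, ys, lambda us: t_interval(us, 3.182), grid)` for n = 4 pairs, where 3.182 is the 0.975-quantile of the Student-t distribution with 3 degrees of freedom. With this particular rule the membership test reduces to the Fieller inequality, so the sketch mainly serves to show where a different one-dimensional interval rule would be plugged in.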

The big advantage of this construction is that the projection in direction of the
true value $\rho$ is not singled out as a "special" projection; we simply look at all
projections. Hence, Construction 2 does not require any knowledge about $\rho$.

Theorem 5 ($R_{gen}$ is an exact confidence set for $\rho$) Let $(X_i, Y_i)_{i=1,\dots,n} \in \mathbb{R}^2$
be i.i.d. pairs of random variables with arbitrary distribution such that the joint
mean of (X, Y) exists. If the confidence sets $J_r$ used in Construction 2 exist and
are exact (resp. conservative resp. liberal) confidence sets of level $(1 - \alpha)$ for
the means of $\pi_{r\perp}((X_i, Y_i))_{i=1,\dots,n}$, then $R_{gen}$ is an exact (resp. conservative resp.
liberal) confidence set for $\rho$.

Proof. In the exact case, we have to prove that the true ratio $\rho$ satisfies
$P(\rho \in R_{gen}) = (1 - \alpha)$. By definition of $R_{gen}$, for each $r \in \mathbb{R}$ we have that
$r \in R_{gen} \iff 0 \in J_r$. In particular, this also holds for $r = \rho$. Moreover, the
projection corresponding to the true ratio $\rho$ projects the true mean $\mu$ on the
origin of the coordinate system. By linearity, the projection of the true mean
$\pi_{\rho\perp}(\mu)$ equals the mean of the projected random variables. By construction of
$J_\rho$ we know that the latter is inside $J_\rho$ with probability exactly $(1 - \alpha)$. So we
can conclude that $P(\rho \in R_{gen}) = P(0 \in J_\rho) = P(\pi_{\rho\perp}(\mu) \in J_\rho) = 1 - \alpha$. □

We proved that the set $R_{gen}$ defined in Construction 2 is an exact confidence set
for the ratio of random variables. The only assumptions are that the means of
X and Y exist and that there is a rule to compute exact confidence intervals
for the means of the projections $\pi_{r\perp}(X, Y)$. To our knowledge, Construction
2 is the first construction of exact confidence sets for general distributions. It
reduces the difficult problem of estimating confidence sets for the ratio of two
random variables to the problem of estimating confidence sets for the means of
one-dimensional random variables. At first glance this looks very promising.
However, the crux for applying this construction in practice is that one has to
know the analytic form of the distribution of the projected means. For this one
has to be able to derive an analytic expression for general linear combinations
of X and Y. While there might be some special cases in which this is tractable,
for the vast majority of distributions such an analytic form is not easy to obtain.
As a consequence, while being of theoretic interest, Construction 2 is of limited
relevance for practical applications.

4. Conservative confidence sets for more general ran- 
dom variables 

Our geometric principles can also be used to derive very simple conservative
confidence sets for general distributions. The main idea is to replace the ellipse used
in Construction 1 by a more general convex set $M \subset \mathbb{R}^2$. A straightforward idea
is to choose M as a $(1 - \alpha)$-confidence set for the bivariate joint mean $\mu \in \mathbb{R}^2$, that
is a set such that $P(\mu \in M) = 1 - \alpha$. Then, as above we can construct the
wedge W around M which is given by the two enclosing tangents and choose
a confidence region $R_{cons}$ by intersecting the wedge with the line x = 1,
distinguishing between the same three cases as above. For general distributions, there
exists a simple but effective way to choose the convex set M. Namely, we take
the axis-parallel rectangle $A := I_1 \times I_2$, where the intervals $I_1 := [l_1, u_1]$ and
$I_2 := [l_2, u_2]$ are confidence intervals for the one-dimensional means $\mu_1$ of X and
$\mu_2$ of Y. Formally, the construction is the following:

Construction 3 (Geometric construction of conservative confidence
regions $R_{cons}$ for $\rho$ for general distributions)

1. Construct exact confidence intervals $I_1$ and $I_2$ of level $(1 - \alpha/2)$ for the
means of X and Y, respectively. In the two-dimensional plane, define the
rectangle $A = I_1 \times I_2$.

2. (a) If (0,0) is not inside A, construct the two tangents to A through the
origin (0,0), and let W be the wedge enclosed by those tangents. Define
the confidence region $R_{cons}$ as the intersection of W with the line x = 1.
Depending on whether the y-axis lies inside W or not, this results in an
exclusive unbounded or a bounded confidence region.

(b) If (0,0) is inside A, choose the confidence region as $R_{cons} = ]-\infty, \infty[$.
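
Given the two marginal intervals, step 2 of Construction 3 reduces to a case distinction on whether 0 lies in each interval, since the tangents from the origin to an axis-parallel rectangle pass through its corners. The following sketch assumes the intervals are passed as pairs (lower, upper) and that each has positive length; the function name and return format are hypothetical.

```python
import math


def ratio_of_intervals(i1, i2):
    """Sketch of step 2 of Construction 3 for the rectangle A = I1 x I2.

    i1 = (l1, u1) is the interval for the mean of X (denominator),
    i2 = (l2, u2) the interval for the mean of Y (numerator).
    Returns ("completely unbounded", None), ("bounded", (l, u)) for [l, u],
    or ("exclusive unbounded", (l, u)) for ]-inf, l] U [u, inf[.
    """
    (l1, u1), (l2, u2) = i1, i2
    zero_in_x = l1 <= 0.0 <= u1
    zero_in_y = l2 <= 0.0 <= u2
    if zero_in_x and zero_in_y:
        # Case (b): the origin lies inside the rectangle.
        return "completely unbounded", None
    if not zero_in_x:
        # The rectangle stays away from the y-axis: the extreme slopes are
        # attained at corners of the rectangle.
        slopes = [y / x for x in (l1, u1) for y in (l2, u2)]
        return "bounded", (min(slopes), max(slopes))
    # The rectangle meets the y-axis but not the x-axis: the wedge contains
    # the y-axis, and the finite end points are corner slopes again.
    slopes = [y / x for x in (l1, u1) if x != 0.0 for y in (l2, u2)]
    negative = [s for s in slopes if s < 0.0]
    positive = [s for s in slopes if s > 0.0]
    lower_end = max(negative) if negative else -math.inf
    upper_start = min(positive) if positive else math.inf
    return "exclusive unbounded", (lower_end, upper_start)
```

For instance, the intervals (2, 4) for the denominator and (1, 3) for the numerator give the bounded set [0.25, 1.5], while (-1, 2) and (1, 3) give the exclusive unbounded set consisting of all ratios at most -1 or at least 0.5.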

Theorem 6 ($R_{cons}$ is a conservative confidence set for $\rho$) Let
$(X_i, Y_i)_{i=1,\dots,n} \in \mathbb{R}^2$ be i.i.d. pairs of random variables with arbitrary
distribution such that the joint mean of (X, Y) exists. If the confidence sets $I_1$ and
$I_2$ used in Construction 3 exist and are exact or conservative confidence sets of
level $(1 - \alpha)$ for the means of X and Y, then $R_{cons}$ is a conservative confidence
set for $\rho$ of level $(1 - 2\alpha)$.






ULRIKE VON LUXBURG AND VOLKER H. FRANZ 



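The proof of this theorem is nearly trivial and can be given in two lines:

\[ P(\rho \in R_{cons}) = P(\mu \in W) \ge P(\mu \in A) = P(\mu_1 \in I_1 \text{ and } \mu_2 \in I_2) \]

\[ = 1 - P(\mu_1 \notin I_1 \text{ or } \mu_2 \notin I_2) \ge 1 - \big( P(\mu_1 \notin I_1) + P(\mu_2 \notin I_2) \big) = 1 - 2\alpha. \quad \square \]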

Interestingly, it can be seen easily that the set $R_{cons}$ constructed using the
rectangle coincides with the set obtained by "dividing" the one-dimensional confidence
intervals $I_2$ by $I_1$, namely $R_{cons} = I_2/I_1 := \{ y/x \,;\, y \in I_2, x \in I_1 \}$. The latter is
a heuristic for confidence sets for ratios which can sometimes be found in the
literature, usually without any theoretical justification. Our geometric method
now reveals effortlessly that it is statistically safe to use this heuristic, but that
it will lead to conservative confidence sets of level $1 - 2\alpha$.

Of course, one could think of even more general ways to construct a convex set
$M \subset \mathbb{R}^2$ as base for the conservative geometric construction. For example,
instead of using axis-parallel projections as in Construction 3, one could base the
convex set M on projections in arbitrary directions (for example, using the two
projections in direction of $\rho$ and $\rho\perp$, or even using more than two projections).
However, we would like to stress one big advantage of using the axis-parallel
rectangle. While the exact generalizations presented in Section 3 require us to
construct confidence sets for the means of arbitrary linear combinations of the form
aX + bY, for the rectangle construction we only need to be able to construct
exact confidence sets for the marginal distributions of X and Y, respectively. One
can envisage many situations where distributional assumptions on X and Y are
reasonable, but where the distributions of projections of the form aX + bY cannot
be computed in closed form. In such a situation, the rectangle construction can
serve as an easy loophole. The price we pay is that of obtaining conservative
confidence sets for the ratio instead of exact ones. But in many cases, obtaining
confidence sets which are provably conservative might be preferred over using
heuristics with unknown guarantees to approximate exact confidence sets.

5. Bootstrap confidence sets 

In the last sections we have seen how exact and conservative confidence sets for
ratios of very general classes of distributions can be constructed. In practice,
the application of those methods is limited by the problem that we still need
strong assumptions to apply them: we need to know the exact distributions of
the projections of (X, Y). In this section we want to investigate how approximate
confidence sets can be constructed in cases where the underlying distributions
are unknown. A natural candidate to construct approximate confidence sets for
ratios are bootstrap procedures (e.g., Efron, 1979; Efron and Tibshirani, 1993;
Shao and Tu, 1995; Davison and Hinkley, 1997). However, if the variance of
the statistic of interest does not exist, as is usually the case for $\hat\rho$, bootstrap
confidence regions can be erroneous (Athreya, 1987; Knight, 1989). Moreover,
standard bootstrap methods which attempt to bootstrap the statistic $\hat\rho$ directly
cannot result in unbounded confidence regions. This is problematic, as it has
been shown that any method which is not able to generate unbounded
confidence limits for a ratio can lead to arbitrarily large deviations from the intended
confidence level (Gleser and Hwang, 1987; Koschat, 1987; Hwang, 1995). Hence,
bootstrapping $\hat\rho$ directly is not an option. Instead, in the literature there are
several approaches to use bootstrap methods based on the studentized
statistic $T_{r,\hat C}(\hat\mu)$ introduced in Equation (2.1). A simple approach along those lines
is taken in Choquet, L'Ecuyer, and Leger (1999). The authors use standard
bootstrap methods to construct a confidence interval $[q_1, q_2]$ for the mean of
this statistic. As confidence set for the ratio, they then use the interval
$[\hat\rho - q_2 S_{\hat\rho},\; \hat\rho - q_1 S_{\hat\rho}]$ where $S_{\hat\rho}$ is the estimated standard deviation of $\hat\rho$. However,
this approach is problematic: the confidence sets do not have the same qualitative
behavior as the Fieller ones, and as they are always finite, the coverage probability
can be arbitrarily small.

5.1 Bootstrap approach by Hwang and its geometric interpretation 

A more promising bootstrap approach for ratios has been presented by Hwang 
(1995). He suggests using standard bootstrap methods to construct confidence 
sets for the mean of $T(\hat\mu)$. He then proceeds as Fieller does and solves a 
quadratic inequality to determine the confidence 
set for the ratio. Hwang (1995) argues that his confidence sets are advantageous 
when dealing with asymmetric distributions such as exponential distributions. 
However, we need to be careful here. Hwang (1995) only treats the case of 
one-sided confidence sets, where he constructs a confidence set of the form $]-\infty, q]$ 
for $T(\hat\mu)$ and then solves the quadratic inequality $T(\hat\mu)^2 \le q^2$. This leads 
to the three well-known cases: bounded, exclusively unbounded, and completely 
unbounded. However, the two-sided case is more involved and is not discussed in 
his paper. If one uses symmetric bootstrap confidence sets of the form $[-q, q]$ 
for $T(\hat\mu)$, then one can proceed by solving one quadratic inequality similar 
to the one above. However, if one wants to exploit the fact that the distribution might 
not be symmetric, one would have to use asymmetric (for example equal-tailed) 
confidence sets of the form $[q_1, q_2]$ for $T(\hat\mu)$. But then, solving the inequalities 
$q_1 \le T(\hat\mu) \le q_2$ can lead to unpleasant effects. To satisfy both inequalities 
simultaneously, one has to solve two different quadratic inequalities. The joint 
solution can attain not only the three Fieller types, but all possible intersections 
of two Fieller-type sets. For example, one can obtain confidence sets for the ratio 
which are only unbounded on one side, such as $]-\infty, l] \cup [l', u]$. Such confidence 
sets are quite implausible: as we discussed after Construction 1, in cases where 
the denominator is not significantly different from 0, the confidence set should be 
unbounded on both ends. Otherwise, the confidence set of the ratio would reflect 
a certainty about the sign of the denominator that is not present in the confidence 
set of the denominator itself. Consequently, we believe that Hwang's approach 
should only be used with symmetric (and not with equal-tailed) confidence sets 
for $T(\hat\mu)$. In this case, Hwang's bootstrap approach can easily be interpreted 
in our geometric approach and is in fact very similar to Fieller's approach: as in 
Construction 1, one forms the covariance ellipse centered at $\hat\mu$ using the estimated 
covariance matrix $\hat C$. But instead of using quantiles of the Student-t distribution 
to determine the width $q$ of the ellipse, one now uses bootstrap quantiles for this 
purpose. Then one proceeds exactly as in the Fieller case. This geometric 
interpretation reveals that Hwang's approach relies on one crucial assumption on the 
distribution of the sample means: their covariance structure has to be elliptical. 
So while seeming distribution-free at first glance, Hwang's bootstrap approach 
with symmetric confidence sets relies on the implicit assumption that the sample 
mean is elliptically distributed. Below we will illustrate some consequences of 
this insight in simulations. 
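
The Fieller-type step shared by both methods can be sketched as follows. Equation (2.1) is not restated in this section, so the sketch assumes the usual studentized form $T = (\hat\mu_2 - \rho\hat\mu_1) / \sqrt{c_{22} - 2\rho c_{12} + \rho^2 c_{11}}$, where $c_{11}, c_{12}, c_{22}$ are the entries of the estimated covariance matrix of the sample mean. The function name and the return encoding are hypothetical. The width $q$ is passed in, so it may be a Student-t quantile (Fieller) or a symmetric bootstrap quantile (Hwang).

```python
import math


def fieller_type_set(mu1, mu2, c11, c12, c22, q):
    """Solve T(rho)^2 <= q^2 for rho under the assumed form of T.

    mu1, mu2: sample means of X (denominator) and Y (numerator).
    c11, c12, c22: entries of a positive semi-definite covariance estimate
    of the sample mean (an assumption of this sketch).
    Returns (kind, a, b) with kind in {"bounded", "exclusively_unbounded",
    "completely_unbounded"}; for "exclusively_unbounded" the set is
    ]-inf, a] united with [b, inf[.
    """
    # (mu2 - rho*mu1)^2 <= q^2 * (c22 - 2*rho*c12 + rho^2*c11)
    # rearranged into  a2*rho^2 + a1*rho + a0 <= 0.
    a2 = mu1 ** 2 - q ** 2 * c11
    a1 = -2.0 * (mu1 * mu2 - q ** 2 * c12)
    a0 = mu2 ** 2 - q ** 2 * c22
    if a2 == 0:
        raise ValueError("degenerate linear case is not handled in this sketch")

    disc = a1 ** 2 - 4.0 * a2 * a0
    if disc < 0:
        # No real roots. For a valid covariance estimate this only happens
        # with a2 < 0, so the inequality holds for every rho.
        return ("completely_unbounded", -math.inf, math.inf)

    root = math.sqrt(disc)
    r1, r2 = sorted(((-a1 - root) / (2 * a2), (-a1 + root) / (2 * a2)))
    if a2 > 0:
        # Denominator significantly different from 0: interval between roots.
        return ("bounded", r1, r2)
    # Denominator not significant: the set is the complement of ]r1, r2[.
    return ("exclusively_unbounded", r1, r2)
```

The sign of the leading coefficient `a2` encodes whether the denominator mean is significantly different from 0 at width `q`, which is exactly what separates the bounded case from the two unbounded ones.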

5.2 A geometric bootstrap approach 

We now want to suggest a bootstrap approach which is potentially better suited to 
deal with highly asymmetric distributions. To this end, we adapt the geometric 
Construction 3 to a bootstrap setting. This can be done in a straightforward 
manner: we simply use bootstrap methods to construct the one-dimensional 
confidence intervals $I_1$ and $I_2$ used in Construction 3, and then proceed exactly as 
in Construction 3. The advantage of this approach is obvious: we do not need 
to make any assumptions on the distribution, can easily use asymmetric 
confidence intervals $I_1$ and $I_2$, and still obtain a Fieller-type behavior (as opposed 
to Hwang's method, which does not have this behavior when using asymmetric 
bootstrap sets). Moreover, our construction does not assume an elliptical covariance 
structure, and can, for example, be used for heavy-tailed distributions which are 
not in the domain of attraction of the normal law. In this sense, the geometric 
bootstrap approach can be applied in situations where both Fieller's and Hwang's 
confidence sets fail. This will be demonstrated below. 
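
A minimal sketch of the first step of this approach, using only the Python standard library, is given below. It builds equal-tailed bootstrap-t intervals for the means of X and Y and reports whether the resulting ratio set would be bounded, that is, whether 0 lies outside the denominator interval. The function names, the quantile indexing, and the choice to run each marginal interval at level $1 - \alpha/2$ (so that the rectangle is conservative at level $1 - \alpha$, in line with the $1 - 2\alpha$ statement for the rectangle construction) are assumptions of this sketch.

```python
import random
import statistics


def bootstrap_t_interval(data, level, n_boot=2000, rng=None):
    """Equal-tailed bootstrap-t confidence interval for the mean of `data`."""
    rng = rng or random.Random(0)
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / n ** 0.5
    t_stats = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # draw n points with replacement
        se_star = statistics.stdev(resample) / n ** 0.5
        if se_star > 0:  # skip degenerate resamples with zero spread
            t_stats.append((statistics.fmean(resample) - mean) / se_star)
    t_stats.sort()
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * len(t_stats))
    hi_idx = min(len(t_stats) - 1, int((1.0 - tail) * len(t_stats)))
    # Bootstrap-t reverses the quantiles: the upper quantile of the
    # studentized statistic determines the lower confidence limit.
    return mean - t_stats[hi_idx] * se, mean - t_stats[lo_idx] * se


def bootstrap_rectangle(xs, ys, alpha=0.10, n_boot=2000, seed=0):
    """Intervals I1 (mean of X) and I2 (mean of Y) spanning the rectangle."""
    rng = random.Random(seed)
    level = 1.0 - alpha / 2.0  # per-coordinate level (assumption, see lead-in)
    i1 = bootstrap_t_interval(xs, level, n_boot, rng)
    i2 = bootstrap_t_interval(ys, level, n_boot, rng)
    # The ratio set is bounded exactly when 0 is outside the interval I1.
    bounded = not (i1[0] <= 0.0 <= i1[1])
    return i1, i2, bounded
```

The wedge is then formed from the rectangle $I_1 \times I_2$ exactly as in Construction 3. Swapping `bootstrap_t_interval` for another one-dimensional method changes nothing else in the procedure, which is the design point of the geometric approach.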

Note that one can easily come up with other, more involved bootstrap methods 
based on the geometric method. For example, one can use more than two 
projections, one can use projections which are not parallel to the coordinate axes, 
or one can even base the wedge on more general two-dimensional convex sets in 
the plane. A completely different approach can be based on bootstrapping polar 
representations of the data (along the lines of Koschat, 1987). However, given 
that in our simulations those methods did not perform better than the existing 
methods, we will not discuss them in detail. 

5.3 Simulation study 

In this section we present some numerical simulations to compare 
the bootstrap approach by Hwang, our geometric bootstrap approach, and 
Fieller's standard confidence set. 

Setup. For both X and Y we use three different types of distributions: 

Normal distributions. Here we always fixed the mean to 1 and varied the 
variance between 0.1 and 10. 

Exponential distributions. They are highly asymmetric, but still in the domain 
of attraction of the normal law. Here we varied the mean between 0.1 and 10. 

Pareto distributions with density function $p(x) = a k^a / x^{a+1}$, cf. Chapter 20 of 
Johnson, Kotz, and Balakrishnan (1994). For a Pareto(k, a) distributed random 
variable, all moments of order smaller than $a$ exist, whereas the moments of 
order $a$ or larger do not exist. In particular, for $a \in ]1,2[$, the expectation exists, 
but the variance does not. In this case the distribution is heavy-tailed and not in the 
domain of attraction of the normal law. In our experiments, we varied the tail parameter 
$a$ between 1.1 and 2.5 and always chose the parameter $k$ such that the expectation 
is 1 (that is, we chose $k = (a - 1)/a$). For some simulations we also used an 
inverted Pareto distribution (a Pareto distribution which has been flipped around 
its mean, so that its tail goes in the negative direction). 
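
The Pareto setup can be sketched in a few lines of Python. The helper name and its signature are hypothetical; the sampler relies on `random.paretovariate`, which draws from a Pareto law with scale 1, and rescales by $k = (a - 1)/a$ so that the expectation $a k / (a - 1)$ equals 1. The inverted variant reflects each draw about the mean 1.

```python
import random


def sample_pareto_mean_one(a, n, rng, inverted=False):
    """Draw n points from Pareto(k, a) with k chosen so that the mean is 1.

    With inverted=True the sample is reflected about its mean 1, so the
    heavy tail points in the negative direction.
    """
    if a <= 1:
        raise ValueError("the expectation exists only for a > 1")
    k = (a - 1.0) / a  # scale parameter giving expectation a*k/(a-1) = 1
    # rng.paretovariate(a) is >= 1, so k * paretovariate(a) is >= k.
    xs = [k * rng.paretovariate(a) for _ in range(n)]
    if inverted:
        xs = [2.0 - x for x in xs]  # reflection x -> 2*mean - x with mean 1
    return xs
```

For $a$ close to 1 the empirical mean of such a sample converges very slowly to 1, which is one practical symptom of the heavy-tailed regime discussed in the text.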

For each fixed distribution of X and Y, we independently sampled n = 20 
(n = 100, n = 1000, respectively) data points $X_i$ and $Y_i$. Then we computed 
the Fieller confidence set according to Definition 2, our geometric bootstrap 
confidence sets as introduced above, and Hwang's bootstrap confidence sets. Each 
simulation was repeated R = 1000 times to compute the empirical coverage. As 
nominal coverage probability we always chose 90% (in terms of coverage, this is 
more meaningful than the level 95% as it leaves more room for deviations in both 
directions). To construct the bootstrap confidence sets for the one-dimensional 
means of X and Y (in the geometric method) and the projection $T(\hat\mu)$ (in 
Hwang's method) we used different bootstrap methods. As default bootstrap 
method we used bootstrap-t (cf. Efron and Tibshirani, 1993). We also tried 
several other standard methods such as the percentile or the bias corrected and 
accelerated (BCA) method (cf. Efron and Tibshirani, 1993), but did not 
observe qualitatively different behavior. To deal with heavy-tailed distributions, 
we applied methods based on subsampling self-normalizing sums, as introduced 
by Hall and LePage (1996), see also Romano and Wolf (1999). Here one has 
to choose one parameter, namely the size m of the subsamples. We did not 
use any automatic method to optimize this parameter, but based on values 
reported in Romano and Wolf (1999) we fixed it to m = 10 (40, 400) for n = 20 
(100, 1000). For all bootstrap methods, we tried both equal-tailed and symmetric 
confidence sets, in all cases with B = 2000 bootstrap samples. We will report 
the bootstrap results using notations such as Hwang(symmetric, bootstrap-t) 
or Geometric(equal-tailed, Hall). The terms in parentheses always refer to the 
construction of the confidence sets for the respective one-dimensional projections. 

Evaluation. In all settings we evaluated the empirical coverage (see Fig- 
ure 5.1) and the number of bounded confidence sets (see Figure 5.2). Due to 
space constraints we cannot show the results for all parameter settings in detail. 
Many more figures can be found in the supplementary material to this paper 
(von Luxburg and Franz, 2007). 
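
The two evaluation criteria can be computed as in the following sketch. It assumes that each simulated confidence set is stored as a tuple `(kind, a, b)`, a hypothetical encoding chosen for illustration in which `"exclusively_unbounded"` denotes the set $]-\infty, a] \cup [b, \infty[$.

```python
def covers(conf_set, rho):
    """Return True if the ratio rho lies in the encoded confidence set."""
    kind, a, b = conf_set
    if kind == "bounded":
        return a <= rho <= b
    if kind == "exclusively_unbounded":
        return rho <= a or rho >= b
    if kind == "completely_unbounded":
        return True
    raise ValueError(f"unknown confidence set type: {kind}")


def evaluate(conf_sets, true_ratio):
    """Empirical coverage and fraction of bounded sets over all repetitions."""
    n = len(conf_sets)
    coverage = sum(covers(s, true_ratio) for s in conf_sets) / n
    bounded = sum(s[0] == "bounded" for s in conf_sets) / n
    return coverage, bounded
```

Reporting both numbers together matters because a method can trivially reach full coverage by always returning the whole real line, which would show up as a bounded fraction of zero.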

Coverage properties in case of finite variance. We start with the case where 
both X and Y are normally distributed (Figure 5.1, first row). Here Fieller's 
confidence set is exact, and indeed we can see that it achieves very good 
coverage values. In terms of absolute deviation from the nominal confidence level, 
Hwang performs comparably to Fieller. The difference is that Fieller tends to 
be slightly conservative, while Hwang tends to be slightly liberal. As predicted, 
the geometric method is conservative and achieves higher than nominal coverage. 
For all three methods, the results based on different sample sizes and different 
bootstrap constructions are qualitatively very similar (see supplement). 

To investigate the effect of symmetry, we consider the case where one of the 
random variables is exponentially distributed and thus highly asymmetric 
(Figure 5.1, second row). We can see that qualitatively, the three procedures behave 
as described above (Fieller slightly conservative, Hwang slightly liberal, 
geometric conservative), even for a small sample size n = 20 (results for larger n are 
similar, see supplement). The fact that the original distribution is asymmetric 
does not seem to have much impact on the results. 

Coverage properties in heavy-tailed regime. The general picture changes 
dramatically if we investigate the case of heavy-tailed distributions. Here we consider 
simulations with X ~ Pareto, Y ~ inverted Pareto. The reason for using the 
inverted Pareto distribution for Y (instead of the "standard" one) is that we want 
to study a general asymmetric case: the distribution of the projections on $L_\rho^{\perp}$ 
would be perfectly symmetric if both X and Y were generated 
according to the same distribution. Results for X, Y ~ Pareto can be found in the 
supplement. In Figure 5.1, third row, we can see that for the heavy-tailed 
parameters a < 2, both Fieller's and Hwang's confidence sets fail completely and lead 
to empirical coverage probabilities below 0.20 instead of 0.90. For Hwang, this 
happens no matter what bootstrap method we use (symmetric or equal-tailed, 
bootstrap-t or Hall), see Figure 5.1, rows three to five and supplement. The 
method Geometric(equal-tailed, Hall), on the other hand, performs much better 
than both Fieller's and Hwang's methods in the heavy-tailed regime a < 2. The 
overall coverage of the geometric method never drops below 0.70, a dramatic 
improvement over the other two methods. It is interesting to observe that the 
good performance of the geometric method in the heavy-tailed regime decreases 
massively if we use bootstrap-t instead of Hall's bootstrap intervals (Figure 5.1, 
fifth row). The reason is that in the heavy-tailed case, bootstrap-t does not 
achieve good coverage for the one-dimensional projections, and then of course 
the coverage of the final confidence intervals suffers as well. Finally, when the 
Pareto tail parameter moves into the region a > 2, we are again in the domain of 
attraction of the normal law. Here all results again resemble the ones already 
reported for the finite variance case. 

Interpretation of the results in terms of projections. The quality of all three 
methods crucially depends on the quality of the one-dimensional confidence sets 
under consideration. For distributions in the domain of attraction of the normal 
law, Fieller's confidence sets perform very well, even for highly asymmetric 
distributions. The reason is that even for small sample sizes, the distribution of the 
sample means is already so close to normal that using bootstrap does not lead to 
any advantage over using a normal distribution assumption. In the heavy-tailed 
regime, both Hwang and Fieller fail. This is the case because neither of them 
achieves good coverage probabilities for the projected one-dimensional random 
variables $T(\hat\mu)$ in the first place. Here the geometric method has a big 
advantage over the other two methods, because instead of considering projections 
in arbitrary directions we only have to deal with projections on the coordinate 
axes. The fact that the coverage of the one-dimensional confidence sets on the 
projections is an important indicator for the quality of the confidence set for 
the ratio can also be observed from the fact that the coverage of 0.70 achieved by 
Geometric(Hall) (Figure 5.1, rows three and four) is in accordance with values 
reported by Romano and Wolf (1999) for the coverage of confidence sets for the 
mean of Pareto distributions. 

Number of bounded confidence sets. In Figure 5.2 we compare the number of 
bounded confidence sets for the three methods. Often, those numbers do not 
differ too much across the different methods. In some cases, 
Geometric(equal-tailed) performs favorably in that it has more bounded confidence sets than the 
other methods (see supplement for more figures). In the asymmetric heavy-tailed 
case it can be seen that when using symmetric rather than equal-tailed confidence 
sets in the geometric method, the number of bounded confidence sets decreases 
heavily (compare third and fourth row of Figure 5.2). This is due to the fact 
that the one-dimensional confidence sets then become very large in both 
directions (whereas the equal-tailed ones are only large in one direction). Hence, the 
origin is contained in the resulting rectangle much more often, which then leads 
to unbounded confidence sets. This strongly speaks in favor of using equal-tailed 
bootstrap confidence sets rather than symmetric ones in the geometric method. 
Note that for Hwang's method, using equal-tailed confidence sets can lead to 
implausible confidence sets which are unbounded on one side, but bounded on 
the other side (as explained above). In our experiments, such confidence sets 
indeed did occur, but not very often (about 20 times out of 1000 repetitions). 

Summary. The geometric approach to confidence sets for ratios shows that 
confidence sets for ratios can be derived from one-dimensional confidence sets for the 
mean of projections of (X, Y). Of course, the quality of the ratio confidence sets 
crucially depends on the quality of those one-dimensional confidence sets. Based 
on our experiments, we would like to give the following advice. For distributions 
which are in the domain of attraction of the normal law, we recommend using 
Fieller's confidence set instead of any bootstrap method. Here, Fieller's set 
works fine even for small sample sizes and for asymmetric distributions. Hwang's 
set achieves comparable results in terms of absolute deviation, but as opposed to 
Fieller's sets its deviations tend to be on the liberal side, which should be avoided 
in our opinion. For asymmetric heavy-tailed distributions we recommend using 
our Geometric(equal-tailed, Hall) method. This method can be seen as a natural 
generalization of the geometric interpretation of the Fieller method to a 
bootstrap scenario. Even though it does not work perfectly, its coverage outperforms 
Fieller's and Hwang's methods by a large margin, and the number of bounded 
confidence sets is often higher than for Fieller or Hwang. The performance of 
the geometric method of course depends on the performance of the bootstrap 
method used for the one-dimensional distributions. If one is able to improve 
the bootstrap intervals for the mean of those distributions, one is very likely to 
further improve the coverage of the geometric confidence sets for the ratio. 

Figure 5.1: Empirical coverage. Each row corresponds to one fixed set of parameters, and 
shows the empirical coverage of the three methods. The nominal confidence level 0.90 
is always depicted in yellow, red colors depict conservative and green/blue colors liberal 
confidence sets. The color scales are constant within each row, but change between the 
rows. 
[Panel images not reproduced. Panel titles as extracted, all at nominal level 0.90: 
X ~ normal, Y ~ normal, n = 100: Geometric(symmetric, bootstrap-t), Hwang(symmetric, bootstrap-t), Fieller; 
X ~ Pareto, Y ~ inverted Pareto, n = 100: Geometric(equal-tailed, Hall), Hwang(equal-tailed, Hall), Fieller; 
X ~ Pareto, Y ~ inverted Pareto, n = 100; 
X ~ Pareto, Y ~ inverted Pareto, n = 1000; 
Geometric(equal-tailed, bootstrap-t), Hwang(equal-tailed, bootstrap-t), Fieller.] 

Figure 5.2: Percentage of bounded confidence sets (over 1000 simulations). Each row 
corresponds to one fixed set of parameters, and shows the percentage of bounded 
confidence sets for the three methods. The color scales are constant within each row, but 
change between the rows. 
[Panel images not reproduced. Panel titles as extracted, all at nominal level 0.90: 
X ~ normal, Y ~ normal, n = 100: Geometric(symmetric, bootstrap-t), Hwang(symmetric, bootstrap-t), Fieller; 
X ~ exponential, Y ~ normal, n = 20: Geometric(equal-tailed, bootstrap-t), Hwang(equal-tailed, bootstrap-t), Fieller.] 



